Add --inductor flag to example_ds3_pp with FORCE_BALANCED_ROUTING by xmfan · Pull Request #361 · meta-pytorch/autoparallel

xmfan · 2026-03-10T21:35:26Z

Stacked PRs:

Add --inductor flag to example_ds3_pp with FORCE_BALANCED_ROUTING

The DSv3 MoE implementation uses .tolist() and data-dependent grouped_mm
offsets that Inductor cannot compile. Add FORCE_BALANCED_ROUTING to the
model that makes token-per-expert counts uniform and uses balanced
all-to-all splits, eliminating all data-dependent ops.
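
As a rough illustration of why uniform routing removes the data dependence, the sketch below computes per-expert counts, cumulative group offsets, and per-rank all-to-all splits from static integers only. The function names, the even-divisibility requirement, and the even sharding of experts across ranks are assumptions made for this example. They are not taken from the PR's code.

```python
from itertools import accumulate


def balanced_tokens_per_expert(num_tokens: int, top_k: int, num_experts: int) -> list[int]:
    """Return uniform per-expert token counts (hypothetical helper).

    Assumes the routed slots (num_tokens * top_k) divide evenly among experts,
    so every count is a plain Python int known before any kernel runs.
    """
    total_slots = num_tokens * top_k
    if total_slots % num_experts != 0:
        raise ValueError("routed slots must divide evenly among experts")
    return [total_slots // num_experts] * num_experts


def group_offsets(counts: list[int]) -> list[int]:
    """Cumulative end offsets of each expert's token group (assumed layout)."""
    return list(accumulate(counts))


def balanced_all_to_all_splits(counts: list[int], ep_degree: int) -> list[int]:
    """Tokens sent to each expert-parallel rank, assuming experts are sharded evenly."""
    if len(counts) % ep_degree != 0:
        raise ValueError("experts must divide evenly among ranks")
    per_rank = len(counts) // ep_degree
    return [sum(counts[r * per_rank:(r + 1) * per_rank]) for r in range(ep_degree)]


counts = balanced_tokens_per_expert(num_tokens=8, top_k=2, num_experts=4)
offsets = group_offsets(counts)
splits = balanced_all_to_all_splits(counts, ep_degree=2)
```

Because every value here derives from shapes and configuration rather than from tensor contents, nothing has to be read back from the device with `.tolist()`.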

The --inductor CLI flag enables both Inductor compilation and forced
balanced routing together.
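
A minimal sketch of how one flag could drive both settings, using `argparse`. The `RunOptions` dataclass and `parse_options` function are hypothetical names for this example, and treating `--inductor` as a boolean `store_true` switch is an assumption. The example script's actual argument handling may differ.

```python
import argparse
from dataclasses import dataclass


@dataclass
class RunOptions:
    # Hypothetical container; the real script may keep these as separate variables.
    use_inductor: bool
    force_balanced_routing: bool


def parse_options(argv: list[str]) -> RunOptions:
    parser = argparse.ArgumentParser(description="pipeline-parallel example (sketch)")
    parser.add_argument(
        "--inductor",
        action="store_true",
        help="compile graphs with Inductor and force balanced MoE routing",
    )
    args = parser.parse_args(argv)
    # Both settings follow the single flag, so Inductor never sees
    # the data-dependent routing path.
    return RunOptions(use_inductor=args.inductor, force_balanced_routing=args.inductor)


enabled = parse_options(["--inductor"])
disabled = parse_options([])
```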

Authored with Claude.

The DSv3 MoE implementation uses .tolist() and data-dependent grouped_mm offsets that Inductor cannot compile. Add FORCE_BALANCED_ROUTING to the model that makes token-per-expert counts uniform and uses balanced all-to-all splits, eliminating all data-dependent ops. The --inductor CLI flag enables both Inductor compilation and forced balanced routing together. Authored with Claude. stack-info: PR: #361, branch: xmfan/stack/31

sanketpurandare

Can you add a CI test for this? Could we reuse the same example PP test with --inductor True?


The module-level `dispatcher.sharding_propagator = CustomShardingPropagator()` was leaking into other test files (e.g. test_api.py) when run in the same pytest process, causing `aten.copy_` failures because the custom propagator doesn't have rules for ops that the default DTensor propagator handles. test_dtensor.py's two test classes (ImplicitRegistrationTest, DimShardingTest) inherit from DTensorTestBase which uses MultiProcessTestCase -- each test spawns subprocesses that re-import the module. Those subprocesses don't run pytest fixtures, so they need the custom propagator installed at module level. We gate the module-level install on `multiprocessing.current_process().name` to only run in spawned workers, and use a module-scoped autouse pytest fixture to install/restore the propagator in the main process. Authored with Claude. stack-info: PR: #367, branch: xmfan/stack/32
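
The gating pattern described above can be sketched with the standard library alone. The `_Dispatcher` class and the string values are stand-ins for the real DTensor dispatcher and propagator objects. The context manager shows the install/restore logic that a module-scoped autouse pytest fixture would wrap. This is an illustration under those assumptions, not the test file's code.

```python
import multiprocessing
from contextlib import contextmanager


class _Dispatcher:
    """Stand-in for the real dispatcher object (assumption for this sketch)."""
    sharding_propagator = "default-propagator"


dispatcher = _Dispatcher()


def in_spawned_worker() -> bool:
    # The parent Python process is named "MainProcess". Workers started through
    # multiprocessing get other names (for example "Process-1" by default).
    return multiprocessing.current_process().name != "MainProcess"


@contextmanager
def installed_propagator(custom):
    """Install `custom` and restore the previous propagator on exit."""
    original = dispatcher.sharding_propagator
    dispatcher.sharding_propagator = custom
    try:
        yield
    finally:
        dispatcher.sharding_propagator = original


# Module-level install only in spawned workers, which never run pytest fixtures.
if in_spawned_worker():
    dispatcher.sharding_propagator = "custom-propagator"
```

In the main pytest process, a fixture body would enter `installed_propagator(...)` before the module's tests and leave it afterwards, so other test files see the default propagator again.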

Add _execute_graph() that lazily compiles graph modules with compile_fx_inner on first invocation. Controlled by an inductor kwarg threaded through all _run_* functions. GraphPPRunner accepts inductor=True and propagates it to all GraphPipelineStage instances, which the stage_* action functions read when calling _run_*. Authored with Claude. stack-info: PR: #360, branch: xmfan/stack/30
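
Compile-on-first-call can be illustrated with a small wrapper. `LazyCompiled` and the `compile_fn` parameter are hypothetical. In the PR the compile step is `compile_fx_inner`, whose exact calling convention is not reproduced here, and the real `_execute_graph()` may cache compiled modules differently.

```python
from typing import Any, Callable, Optional


class LazyCompiled:
    """Run a graph callable, compiling it the first time it is invoked (sketch)."""

    def __init__(
        self,
        graph: Callable[..., Any],
        compile_fn: Callable[[Callable[..., Any], tuple], Callable[..., Any]],
        inductor: bool = True,
    ) -> None:
        self._graph = graph
        self._compile_fn = compile_fn  # stand-in for the real compiler entry point
        self._inductor = inductor
        self._compiled: Optional[Callable[..., Any]] = None

    def __call__(self, *args: Any) -> Any:
        if not self._inductor:
            # Eager path: run the graph as-is.
            return self._graph(*args)
        if self._compiled is None:
            # First call: compile using the actual inputs as examples.
            self._compiled = self._compile_fn(self._graph, args)
        return self._compiled(*args)
```

Deferring compilation to the first call means the example inputs are real runtime values, and graphs that never execute are never compiled.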


xmfan force-pushed the xmfan/stack/30 branch from 2a9e575 to a6d4e00 Compare March 10, 2026 21:35

xmfan force-pushed the xmfan/stack/31 branch from fa05732 to 43f286c Compare March 10, 2026 21:35

This was referenced Mar 10, 2026

Deduplicate split_di_dw_graph by reusing upstream _extract_fwd_bwd_modules #359

Merged

Add lazy Inductor compilation to graph_pp_runner #360

Merged

meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 10, 2026

xmfan changed the base branch from xmfan/stack/30 to main March 10, 2026 21:57

xmfan force-pushed the xmfan/stack/31 branch from 43f286c to b98bdb5 Compare March 10, 2026 21:57

xmfan changed the base branch from main to xmfan/stack/30 March 10, 2026 21:57

xmfan requested a review from sanketpurandare March 12, 2026 06:07

xmfan marked this pull request as ready for review March 12, 2026 06:07

sanketpurandare reviewed Mar 16, 2026


xmfan changed the base branch from xmfan/stack/30 to main March 16, 2026 23:41

xmfan force-pushed the xmfan/stack/31 branch from b98bdb5 to 1549d3d Compare March 16, 2026 23:41

xmfan changed the base branch from main to xmfan/stack/30 March 16, 2026 23:41

sanketpurandare self-requested a review March 17, 2026 00:23

sanketpurandare approved these changes Mar 17, 2026


xmfan changed the base branch from xmfan/stack/30 to main March 17, 2026 00:28

xmfan force-pushed the xmfan/stack/31 branch from 1549d3d to 92e53b4 Compare March 17, 2026 00:28

xmfan changed the base branch from main to xmfan/stack/30 March 17, 2026 00:28

xmfan changed the base branch from xmfan/stack/30 to main March 17, 2026 00:57

xmfan force-pushed the xmfan/stack/31 branch from 92e53b4 to 9e9de7a Compare March 17, 2026 00:57

xmfan force-pushed the xmfan/stack/31 branch from 9e9de7a to 5e022a5 Compare March 17, 2026 00:57

xmfan mentioned this pull request Mar 17, 2026

Scope CustomShardingPropagator to test_dtensor tests via pytest fixture #367

Merged

xmfan changed the base branch from main to xmfan/stack/30 March 17, 2026 00:57

xmfan added 3 commits March 16, 2026 20:46

xmfan changed the base branch from xmfan/stack/30 to main March 17, 2026 03:49

xmfan force-pushed the xmfan/stack/31 branch from 5e022a5 to 44110ea Compare March 17, 2026 03:49

xmfan changed the base branch from main to xmfan/stack/30 March 17, 2026 03:49

xmfan changed the base branch from xmfan/stack/30 to main March 17, 2026 14:34

xmfan changed the base branch from main to xmfan/stack/30 March 17, 2026 14:35

xmfan changed the base branch from xmfan/stack/30 to main March 17, 2026 14:36

Merge branch 'main' into xmfan/stack/31

193ccd0

xmfan merged commit 2d7ab90 into main Mar 17, 2026
10 checks passed

xmfan merged 4 commits into main from xmfan/stack/31
